Abstract

The minimum hypergeometric test (mHG) is a powerful nonparametric 
hypothesis test to detect enrichment in ranked binary lists. Here, I provide 
a detailed review of its definition, as well as the algorithms used in its 
implementation, which enable the efficient computation of an exact p-value.
I then introduce a generalization of the mHG, termed XL-mHG,
which provides additional control over the type of enrichment tested, and 
describe the precise algorithmic modifications necessary to compute its 
test statistic and p-value. The XL-mHG algorithm is a building block of 
GO-PCA, a recently proposed method for the exploratory analysis of gene 
expression data using prior knowledge. 
1 Introduction


On the face of it, the minimum hypergeometric test, or simply mHG, is a
nonparametric hypothesis test to detect enrichment in ranked binary lists [1]. This
may sound more exotic than it actually is, since what the mHG essentially 
provides is a powerful way of testing for a (directed) association between one 
continuous and one binary variable, while making very few assumptions about 
the distributional properties of either variable (for a more detailed discussion 
of the mHG as a test for association, see Section 2.5). The mHG is therefore 
very generally applicable. So far, however, the mHG has mostly been applied to 
biological problems, e.g. the detection of DNA sequence motifs in transcription 
factor binding sites [1], or the detection of enriched Gene Ontology (GO) terms 
in ranked lists of genes [2]. 

The XL-mHG is an extension of the mHG that introduces two parameters (X
and L), which are designed to provide additional control over the minimal subset
size that can constitute enrichment (X), and the part of the list that is to be 
tested for enrichment (L). Depending on the application, these parameters can 
help to significantly increase the specificity of the test. This report provides a 
detailed description of how to efficiently implement the mHG, and its extension, 
the XL-mHG, thus allowing for a single test to be performed in milliseconds, 
even for lists containing thousands of elements. 

This manuscript is organized as follows: Section 2 provides a review of the
mHG for readers who are not familiar with it, assuming no background knowledge
other than a familiarity with basic concepts of probability theory. Section 3
provides a detailed discussion of how to efficiently implement the mHG. This
discussion is based entirely on ideas developed by Dr. Zohar Yakhini and
colleagues [1]. Section 4 introduces an extension of the mHG, termed XL-mHG,
which was designed to provide additional control over the type of enrichment
that is tested for. This extension was developed by me, and is used in a recently
proposed biological application for knowledge-based unsupervised analysis of
heterogeneous expression data [3]. Section 4 also includes a detailed discussion
of how to modify the efficient mHG algorithms to calculate XL-mHG test
statistics and their associated p-values. Certain algorithms and derivations are
provided in the Appendix. A free and open-source Cython implementation of
the XL-mHG can be found at https://github.com/flo-compbio/xlmhg.
2 The minimum hypergeometric (mHG) test

The mHG is an enrichment test for ranked binary lists that was developed by Dr. 
Zohar Yakhini and colleagues [1]. This section serves as a review of the mHG 
for readers that are not familiar with it. I first introduce the representation 
of ranked binary lists as binary vectors in Section 2.1. Then, in Section 2.2, I 
describe a simpler enrichment test for such lists, and demonstrate its application 
on a toy example. The discussion of this simple test serves multiple purposes: 
First, the simple test is directly related to the mHG through its reliance on 
the hypergeometric distribution, and almost all of the notation and concepts 
introduced in this section serve as the basis for my later discussion of the mHG. 
Second, I highlight the fact that the simple test suffers from a major drawback, 
which the mHG was specifically designed to overcome. The discussion of the 
simple test therefore provides a strong motivation for the following discussion of 
the mHG. A discussion of how to efficiently implement the mHG is postponed 
to Section 3. 

2.1 Representing a ranked binary list as a vector 

We represent the ranked binary list we are talking about as a vector v of length
N, with entries of only zeros and ones:

$$v = (v_1, v_2, \ldots, v_N)^T, \quad v_i \in \{0, 1\}$$

We refer to individual entries in this list as “elements” (adopting the
terminology used for vectors). We refer to the set of all elements for which $v_i = 0$
as “the 0’s”, and to the set of all other elements as “the 1’s”. We also say that
$v_1$ represents the “topmost” element, and $v_N$ the “bottommost” element of the
list. We further let K and W denote the total number of 1’s and 0’s in the list,
respectively (K + W = N):

$$K = \sum_{i=1}^{N} 1[v_i = 1]$$

$$W = \sum_{i=1}^{N} 1[v_i = 0] = N - K$$

Here, 1[·] denotes the indicator function. We can think of the 1’s as
representing the elements with an “interesting” feature in the list. For example, if we
are dealing with a list of genes, the genes represented by 1’s might represent all
genes that are known to play a role in DNA replication. Typically, the number
of 1’s is smaller (and sometimes much smaller) than the number of 0’s ($K < W$
or $K \ll W$).
2.2 A simple test for enrichment (with a major drawback)

For demonstration purposes, we will assume that we are given a particular vector
$v_{\mathrm{ex}}$, representing a ranked binary list (as explained above). The vector looks as
follows:

$$v_{\mathrm{ex}} = (1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0)^T$$

Using the notation introduced above, we have N = 20, K = 5, and W =
20 − 5 = 15. We are interested in whether there is an enrichment of 1’s “at
the top of the list”. Of course, in order to be able to provide a quantitative
answer to this question, we need to first define what exactly we mean when we
say “enrichment at the top of the list”.

One possibility is to directly define “the top of the list” by introducing an
integer “cutoff” parameter n ($0 \le n \le N$), indicating that “the top of the list”
consists of the first n elements of v. To quantify enrichment, we would then
calculate a test statistic $k_{(n)}$, representing the number of 1’s we observe among
the first n elements:

$$k_{(n)} = \sum_{i=1}^{n} 1[v_i = 1]$$

Under the null hypothesis of no enrichment, we assume that the 0’s and 1’s are
randomly distributed in the list (in other words, we assume that all permutations
of v are equally likely). We can then use the hypergeometric distribution to
assess the statistical significance of observing a certain $k_{(n)}$.


The hypergeometric distribution

A random variable has the hypergeometric distribution if it represents
the number k of “interesting” items (here, the 1’s in the list) among
a sample of size n, drawn without replacement from a population of N
items, of which K are considered “interesting”. The probability mass
function (PMF) of the hypergeometric distribution is defined as follows:

$$f(k; N, K, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \qquad \text{(Hypergeometric PMF)}$$

We use f to represent the hypergeometric PMF throughout this document.
In the PMF, $\binom{a}{b}$ denotes the binomial coefficient (read as “a
choose b”). It represents the number of ways in which we can select
b elements from a total of a elements ($b \le a$), when we ignore the order
in which we choose the elements. The binomial coefficient can be
calculated as follows:

$$\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$$
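As a minimal sketch of the PMF defined above, the following Python snippet evaluates f(k; N, K, n) exactly using binomial coefficients. The function name `hypergeom_pmf` is an illustrative choice and not part of any package mentioned in this report.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """Return f(k; N, K, n) as an exact fraction."""
    # Outcomes that cannot occur have probability zero.
    if k < 0 or k > min(n, K) or n - k > N - K:
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


# Toy check: N = 20 items, K = 5 "interesting" ones, sample of size n = 5.
pmf = [hypergeom_pmf(k, 20, 5, 5) for k in range(6)]
print(float(pmf[3]))  # probability of exactly 3 interesting items
print(sum(pmf))       # the probabilities over all possible k sum to 1
```

Exact fractions are used here only to make the arithmetic easy to verify by hand; a floating-point implementation would normally be preferred for long lists.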
To assess whether $k_{(n)}$ is statistically significant, we need to calculate its
associated p-value $p^{\mathrm{HG}}_{(n)}$. This p-value is defined as the probability of observing
$k_{(n)}$ or more 1’s among the first n items in the list, under the assumption that
the 1’s and 0’s are randomly distributed. This probability can be calculated as
the “tail” of the hypergeometric distribution:

$$\begin{aligned}
p^{\mathrm{HG}}_{(n)} &= \sum_{k=k_{(n)}}^{\min(n,K)} f(k; N, K, n) \qquad (1) \\
&= 1 - F(k_{(n)} - 1; N, K, n) \\
&= S(k_{(n)} - 1; N, K, n)
\end{aligned}$$

Here, F and S denote the cumulative distribution function (CDF) and the survival
function (SF) of the hypergeometric distribution, respectively:

$$F(k; N, K, n) = \sum_{i=0}^{k} f(i; N, K, n)$$

$$S(k; N, K, n) = \sum_{i=k+1}^{\min(n,K)} f(i; N, K, n)$$

For almost every commonly used programming framework, there are publicly
available packages that provide functions for evaluating f, F, and/or S. For
example, in Python, the SciPy package offers the function scipy.stats.hypergeom.sf
for evaluating S. It is therefore straightforward to implement this test in most
circumstances. Given a specific n, we can now calculate a p-value $p^{\mathrm{HG}}_{(n)}$ for
our test statistic $k_{(n)}$. Since the length of $v_{\mathrm{ex}}$ is N = 20, we might choose
n = N/4 = 5, corresponding to the top 25% of elements, as a reasonable choice
for the “top of the list”. We then have:

$$k_{(5)} = \sum_{i=1}^{5} 1[v_i = 1] = 3$$

$$p^{\mathrm{HG}}_{(5)} = S(3 - 1; 20, 5, 5) \approx 0.073$$

The result of this calculation shows that at the conventional significance level of
α = 0.05, we cannot reject the null hypothesis of no enrichment (since $p^{\mathrm{HG}}_{(5)} > \alpha$).
This result might seem counter-intuitive: Doesn’t the distribution of 1’s in our
vector look highly skewed towards the top (except for one outlier at the bottom)?
Here is $v_{\mathrm{ex}}$ again, with the first 5 elements visually separated to indicate the
“top of the list” for our cutoff of n = 5:

$$v_{\mathrm{ex}} = (1,0,1,1,0 \mid 1,0,0,0,0,0,0,0,0,0,0,0,0,1,0)^T$$
The reader may have noticed that the element at our chosen cutoff of n = 5
is a 0 located right “in between” two 1’s. What if we had instead chosen n = 4,
excluding the 0, or n = 6, including another 1? We would then have:

$$p^{\mathrm{HG}}_{(4)} = S(3 - 1; 20, 5, 4) \approx 0.032$$

$$p^{\mathrm{HG}}_{(6)} = S(4 - 1; 20, 5, 6) \approx 0.014$$

What these results show is that, had we chosen either n = 4 or n = 6, we
actually would have been able to reject our null hypothesis of no enrichment!
This example illustrates a major drawback of our simple enrichment test:
The choice of n strongly affects the outcome of the test. This is a big problem
in practice, because we often do not have a way to determine the “best” n to
use. If we simply choose n arbitrarily (e.g., n = N/4, as in our example), then
there will be situations where we choose n too small, meaning that we miss a
surprising accumulation of 1’s that occurs relatively high in the list, but below
the n’th rank. But we also do not want to choose n too large, since that could
result in an insignificant p-value when there actually are a surprising number of
1’s concentrated at the very top of the list.
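The sensitivity of the simple test to the cutoff can be reproduced with a few lines of Python. This is a minimal sketch that uses only the standard library; the helper name `hg_pvalue` is an illustrative choice. With SciPy installed, the same tail probability should be obtainable as `scipy.stats.hypergeom.sf(k - 1, N, K, n)`.

```python
from math import comb


def hg_pvalue(k, N, K, n):
    """Upper tail P(X >= k) of the hypergeometric distribution, i.e. S(k - 1; N, K, n)."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return total / comb(N, n)


# The toy example from the text.
v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
N, K = len(v_ex), sum(v_ex)

pvals = {}
for n in (4, 5, 6):
    k_n = sum(v_ex[:n])  # number of 1's among the first n elements
    pvals[n] = hg_pvalue(k_n, N, K, n)
    print(f"n = {n}: k = {k_n}, p = {pvals[n]:.3f}")
```

Only the cutoff n = 5 yields a p-value above 0.05 (about 0.073); the neighboring cutoffs n = 4 and n = 6 give about 0.032 and 0.014, respectively.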

What about other ideas for choosing n? Why not try to first look at v,
and choose an n that appears to “capture” a large number of 1’s? In our toy
example, this approach seemed to “work wonders”: A blind choice of n = 5 
did not result in a significant test, but a “data-driven” choice of n = 4 or 
n = 6 did! While this idea might seem attractive at first, it is actually very 
problematic, since the choice of n using this method is subjective. Suppose two 
people “eyeball” different n’s, resulting in one significant and one insignificant 
test. Then, the question of whether there is or isn’t statistically significant 
enrichment would fundamentally come down to a question over who has the 
better judgement. Clearly, this would not be a scientifically sound method. A 
statistical way of describing essentially the same problem is to say that choosing 
n after taking a “peek” at the data introduces a significant amount of (positive) 
bias to the test, so that the p-values obtained would overstate the statistical 
significance of the observed pattern. Moreover, since the choice is subjective, it 
would be impossible to quantify the extent of this bias, making the test all but 
statistically useless. 

Finally, we could decide to try a few different n’s (without looking at the
data), and see if any of those tests come back significant. However, this would
then constitute multiple testing, which we would then have to account for
(raising new problems). In short, none of these ideas provides a useful approach for
the critical choice of n when it is not known a priori.

2.3 mHG: A nonparametric test for enrichment 

In Section 2.2, we discussed an enrichment test for ranked binary lists that is 
simple to implement, but suffers from the major drawback of requiring upfront 
knowledge of n, the parameter that defines what part of the list should be 
considered “the top”. The mHG provides a very elegant solution to this problem,
by giving up on the idea of defining “the top of the list” altogether. The mHG 
gets rid of the n parameter, and replaces it with — nothing. The mHG does 
not require any parameters; in this sense, it is a fully nonparametric test. In 
cases where n is unknown (i.e., most of the time), this property provides the 
mHG with a huge advantage over the simple enrichment test described above. 
However, to achieve this advantage, the mHG relies on a more complex testing 
procedure, which requires many “sub-tests” to be performed. This in turn
introduces a multiple testing problem¹. Fortunately, it turns out that these
issues can be resolved very efficiently.

The mHG method consists of two components: The first component is the
definition of the mHG test statistic $s^{\mathrm{mHG}}$. For the simple enrichment test
described above, the test statistic was simply $k_{(n)}$, the number of 1’s among the
first n elements of the list. However, without a fixed cutoff n, this statistic
obviously does not apply. Instead, $s^{\mathrm{mHG}}$ is defined as the minimum hypergeometric
p-value $p^{\mathrm{HG}}_{(n)}$, taken over all possible cutoffs (see Fig. 1):

$$s^{\mathrm{mHG}} = \min_n \, p^{\mathrm{HG}}_{(n)} \qquad \text{(mHG test statistic)}$$

Note that due to this definition, smaller values of $s^{\mathrm{mHG}}$ represent stronger
enrichment.
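The definition of the test statistic translates directly into code. The following Python sketch simply evaluates the hypergeometric p-value at every cutoff and keeps the minimum; the function names are illustrative, and no attempt is made at efficiency (see Section 3.1 for that).

```python
from math import comb


def hg_pvalue(k, N, K, n):
    """Upper tail P(X >= k) of the hypergeometric distribution."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return total / comb(N, n)


def mhg_statistic_naive(v):
    """Return (s_mhg, n_star): the minimum hypergeometric p-value and its cutoff."""
    N, K = len(v), sum(v)
    best_p, best_n = 1.0, 0  # the empty cutoff n = 0 always has p = 1
    k = 0
    for n in range(1, N + 1):
        k += v[n - 1]  # running count of 1's among the first n elements
        p = hg_pvalue(k, N, K, n)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
s_mhg, n_star = mhg_statistic_naive(v_ex)
print(f"s_mHG = {s_mhg:.4f} at cutoff n = {n_star}")
```

For the toy example, the minimum is attained at n = 6, in agreement with Fig. 1.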

The second component of the mHG is a way of efficiently calculating an
exact p-value $p^{\mathrm{mHG}}$ for $s^{\mathrm{mHG}}$. The null model is still the same as before: We
assume that when there is no enrichment, all permutations of v are equally
likely. However, we have no closed form solution for the distribution of $s^{\mathrm{mHG}}$
under this model. Moreover, there exist an astronomically large number of
permutations, even for moderately-sized lists (say, N=100). This means that
we cannot calculate a p-value by simply enumerating all those permutations
and calculating their test statistic. Therefore, the exact calculation of $p^{\mathrm{mHG}}$
relies on a dynamic programming approach for path counting [1] (discussed in
Section 3.2).

In addition to describing how to calculate $p^{\mathrm{mHG}}$ exactly and in polynomial
time, Eden et al. also derived a useful upper bound for $p^{\mathrm{mHG}}$, known as the
Lipson bound [1]. Its form is surprisingly simple:

$$p^{\mathrm{mHG}} \le K \cdot s^{\mathrm{mHG}} \qquad \text{(Lipson bound)}$$

The argument used to derive this bound relies on the fact that even though
$s^{\mathrm{mHG}}$ is defined as the minimization over the p-values of N hypergeometric tests,
the mHG test can be shown to be equivalent to testing no more than K distinct
“key” cutoffs $n_k$, one for each $k \in \{1, \ldots, K\}$. We typically do not know or care
about the exact values of these $n_k$, but we can still use this fact in order to
derive the bound.

¹For a single hypothesis test, its associated p-value represents the probability of rejecting
the null hypothesis when it is in fact true. The multiple testing problem refers to the fact
that when many hypothesis tests are performed simultaneously, their individual p-values no
longer represent that probability: with enough “tries”, we always expect to obtain some
“significant” p-values, even if all hypotheses tested are truly null. This means that the testing
procedure must somehow be corrected for the fact that many individual tests were performed.

Figure 1: Calculation of the mHG test statistic $s^{\mathrm{mHG}}$ for the ranked binary list
$v_{\mathrm{ex}}$ (the example used in the text). Each bar represents the hypergeometric
p-value $p^{\mathrm{HG}}_{(n)}$ at a given cutoff n. Bars highlighted in yellow correspond to positions
in the list that have a “1” element. $s^{\mathrm{mHG}}$ corresponds to the smallest $p^{\mathrm{HG}}_{(n)}$
(represented here by the largest bar, due to the negative log scale). In this
example, $s^{\mathrm{mHG}}$ occurs at a cutoff of n = 6 (where $k_{(n)} = 4$). Note that for each
cutoff where the list contains a “0”, $p^{\mathrm{HG}}_{(n)}$ is always larger than $p^{\mathrm{HG}}_{(n-1)}$ (represented
by a smaller bar). We can therefore skip the calculation of $p^{\mathrm{HG}}_{(n)}$ for those n when
we calculate $s^{\mathrm{mHG}}$.

2.4 The notion of enrichment behind the mHG test 

What notion of enrichment does the definition of the mHG test statistic translate
to? At first glance, this might seem like an obvious question, since the mHG
test statistic is indeed just the minimization of the simple hypergeometric test’s
p-value $p^{\mathrm{HG}}_{(n)}$ over all possible cutoffs n, and the notion of enrichment underlying
the simple hypergeometric test is easily described: The test asks, “Is there a
surprisingly large number of 1’s above the cutoff n?” A key property of the test
is that it does not take into account the exact distribution of 1’s and 0’s above
and below the cutoff, only their counts. For example, some 1’s could be located
at the very bottom of the list, yet the significance of the test would be just the
same if those 1’s were located right below the cutoff. This makes the test robust
to (negative) outliers, and is in contrast to other nonparametric tests, such as
the Mann-Whitney U test, which operate on the precise ranks of the 1’s and
0’s. However, even in less extreme cases, it is important to understand that the
behavior of a subset of 1’s (i.e. their location in the list) can lead to a positive
test result, while the behavior of the remaining 1’s is essentially ignored. In
particular, when n is small in relation to the length of the list, the subset of 1’s
responsible for a positive test result can represent only a small fraction of the
total number of 1’s. In other words, for relatively small n, a positive test result
can be based on a small fraction of “interesting” items that are located at the
very top of the list. In other situations, when n is large, a positive test can have
the opposite interpretation, as it can be based on a slight overrepresentation of
1’s above the cutoff n.

The point of this discussion is to emphasize that the type of enrichment
detected by the simple hypergeometric test can vary significantly, depending on the
choice of the cutoff parameter n. However, every single mHG test considers all
possible n simultaneously. What does this mean for the notion of enrichment
underlying the mHG? One way of addressing this question from an intuitive
standpoint is to imagine that the mHG inherits all aforementioned (and
contradictory) behaviors of the simple hypergeometric test, while the choice of which
of these behaviors is “expressed” in any specific case depends on the tested list
itself:² For lists where a small subset of 1’s is located at the very top, the mHG
will likely base its enrichment on this subset; for lists with only a slight
overrepresentation of 1’s among, say, the first half of the list, the mHG will detect
enrichment based on that pattern. One could therefore say that the notion of
enrichment behind the mHG is “data-dependent”. This of course is precisely
why the mHG is so useful: It “adapts” the general notion of enrichment to
the specific situation encountered. In Section 4, I introduce two parameters
which, abstractly speaking, provide additional control over how much the mHG
is allowed to adapt its notion of enrichment.

2.5 Testing for directed association 

Before moving on to a description of how to efficiently implement the mHG,
I would like to emphasize that, besides testing for enrichment, the purpose of
the mHG can be more generally understood as testing for association between
one continuous and one binary feature: The continuous feature is used to rank
all items, and the binary feature marks the “interesting” items (the 1’s), as
discussed above. Moreover, this test is directional: The mHG only tests for
enrichment at the top of the list, not at the bottom. Therefore, an accumulation
of 1’s at the bottom of the list does not result in a significant test. Of course,
enrichment at the bottom can be tested separately, by simply inverting the list.

²The reader may excuse my use of a biological term in this context.
3 Efficient implementation of the mHG test

In this section, I provide a detailed description of how to efficiently calculate
the mHG test statistic and its associated p-value, for any ranked binary list.
The ideas for this implementation were developed by Dr. Zohar Yakhini and
colleagues [1]. I present them here in full detail, which then allows me (in
Section 4) to precisely describe the modifications required to accommodate the two
parameters introduced by the XL-mHG test statistic.

3.1 Calculating the mHG test statistic 

The mHG test statistic $s^{\mathrm{mHG}}$ is defined as the minimum hypergeometric p-value
$p^{\mathrm{HG}}_{(n)}$, taken over all possible cutoffs. In principle, the calculation of $s^{\mathrm{mHG}}$ is
therefore very simple:

    s^mHG ← 1.0
    for all n do
        Calculate p^HG_(n)
        s^mHG ← min{p^HG_(n), s^mHG}
    end for
    return s^mHG


However, for large N, calculating the values of all the $p^{\mathrm{HG}}_{(n)}$ individually is
relatively slow. To calculate $s^{\mathrm{mHG}}$ efficiently, we can rely on two key
observations: First, we know that the smallest $p^{\mathrm{HG}}_{(n)}$ will never occur at a cutoff n for
which $v_n = 0$. We can therefore skip the calculation of $p^{\mathrm{HG}}_{(n)}$ for all “0” elements,
which leads to a significant speed-up when $K \ll N$. (For similar reasons, we
can also skip the calculation of $p^{\mathrm{HG}}_{(n)}$ when $v_{n+1} = 1$, which leads to significant
speed-up when there are long stretches of consecutive 1’s in the list.) Second,
even when we hit a 1, we can avoid calculating $p^{\mathrm{HG}}_{(n)}$ “from scratch”. Instead,
we can exploit the fact that N and K remain constant throughout the
procedure, and use a recursive approach to efficiently calculate all the $p^{\mathrm{HG}}_{(n)}$ for which
$v_n = 1$. This approach consists of two sub-algorithms: Algorithm 4 calculates
$p^{\mathrm{HG}}_{(n)}$ from $f(k_{(n)}; N, K, n)$ in O(K), and Algorithm 5 calculates $f(k_{(n)}; N, K, n)$
for all n in O(N). Algorithm 6 combines these sub-routines to calculate $s^{\mathrm{mHG}}$
in O(KN).
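The two observations above can be sketched in Python as follows. This is not a transcription of Algorithms 4-6 from the Appendix; the function name and the update ratios are my own illustration, derived directly from the hypergeometric PMF. The PMF value is updated in constant time when the cutoff advances by one element, and the tail sum is only evaluated at cutoffs where the current element is a 1 and the next one is not.

```python
def mhg_statistic_fast(v):
    """mHG test statistic, evaluating tails only where v[n-1] == 1 and v[n] != 1."""
    N, K = len(v), sum(v)
    W = N - K
    s = 1.0
    f = 1.0  # f(k; N, K, n) for the empty cutoff (n = 0, k = 0)
    k = 0
    for n in range(N):  # n = number of elements already above the cutoff
        if v[n] == 1:
            # Update f(k; N, K, n) -> f(k + 1; N, K, n + 1).
            f *= (K - k) * (n + 1) / ((k + 1) * (N - n))
            k += 1
            if n + 1 < N and v[n + 1] == 1:
                continue  # the next cutoff can only give a smaller p-value
            # Tail sum p = f(k) + f(k + 1) + ... at cutoff n + 1, in O(K).
            cutoff = n + 1
            term = p = f
            for i in range(k, min(cutoff, K)):
                # Ratio f(i + 1; N, K, cutoff) / f(i; N, K, cutoff).
                term *= (K - i) * (cutoff - i) / ((i + 1) * (W - cutoff + i + 1))
                p += term
            s = min(s, p)
        else:
            # Update f(k; N, K, n) -> f(k; N, K, n + 1); no tail is needed.
            w = n - k
            f *= (W - w) * (n + 1) / ((w + 1) * (N - n))
    return s


v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(mhg_statistic_fast(v_ex))
```

The sketch uses plain floating-point arithmetic, so for very long lists the running PMF value may need to be handled with more care than shown here.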

3.2 Calculating the mHG p-value 

Suppose that we have calculated the value of $s^{\mathrm{mHG}}$ for a ranked binary list v
with N elements and K 1’s. How is its associated p-value $p^{\mathrm{mHG}}$ defined? Recall
that under the null model, all permutations of v are assumed to be equally
likely. Therefore, let $S^{\mathrm{mHG},0}$ be a random variable that represents the mHG test
statistic obtained for a random permutation of v. The p-value $p^{\mathrm{mHG}}$ associated
with $s^{\mathrm{mHG}}$ is then defined as the probability of observing an $S^{\mathrm{mHG},0}$ that is at
least as good (i.e., smaller than or equal to, see Section 3.1) as $s^{\mathrm{mHG}}$:

$$p^{\mathrm{mHG}} = \Pr(S^{\mathrm{mHG},0} \le s^{\mathrm{mHG}}) \qquad \text{(mHG p-value)}$$

Let $\mathcal{V}^{(K,N)}$ represent the set of all possible permutations of v (including
v itself), i.e., all ranked binary lists with N elements and exactly K 1’s. Then,
let $V^0$ be a random variable representing a list randomly drawn from $\mathcal{V}^{(K,N)}$.
Using the fundamental bridge³, we can then re-express $p^{\mathrm{mHG}}$ as follows:

$$p^{\mathrm{mHG}} = E\left(1[S^{\mathrm{mHG},0} \le s^{\mathrm{mHG}}]\right)
= \left( \sum_{v^0 \in \mathcal{V}^{(K,N)}} 1[s^{\mathrm{mHG},0} \le s^{\mathrm{mHG}}] \right) \Big/ \, |\mathcal{V}^{(K,N)}|$$

This would then suggest the following (naive) algorithm: 


    p ← 0
    for each v^0 ∈ V^(K,N) do
        Calculate s^mHG,0 (the mHG test statistic for v^0)
        if s^mHG,0 ≤ s^mHG then
            p ← p + 1
        end if
    end for
    return p^mHG = p / |V^(K,N)|


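Unfortunately, the number of lists in $\mathcal{V}^{(K,N)}$ grows incredibly quickly:

$$|\mathcal{V}^{(K,N)}| = \binom{N}{K} = \frac{N!}{K!\,(N-K)!}$$

For example, $|\mathcal{V}^{(20,100)}| \approx 5.4 \times 10^{20}$. In this case, we would therefore have to
calculate more than $10^{20}$ (!) different $s^{\mathrm{mHG},0}$ in order to calculate $p^{\mathrm{mHG}}$, which
shows that this approach is completely infeasible, except for very short lists.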
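For very short lists, the naive enumeration is nevertheless a useful reference against which faster implementations can be checked. The following Python sketch enumerates all lists with N elements and K 1’s and uses exact fractions, so that the comparison with the observed statistic is not affected by rounding. All function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def hg_pvalue(k, N, K, n):
    """Exact upper tail P(X >= k) of the hypergeometric distribution."""
    num = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return Fraction(num, comb(N, n))


def mhg_statistic(v, table):
    """Minimum hypergeometric p-value over all cutoffs, using a lookup table."""
    k, best = 0, Fraction(1)
    for n, x in enumerate(v, start=1):
        k += x
        best = min(best, table[k, n])
    return best


def mhg_pvalue_bruteforce(v):
    """Return (s_obs, p) by enumerating every list with the same N and K."""
    N, K = len(v), sum(v)
    # Precompute the p-value of every (k, n) combination once.
    table = {(k, n): hg_pvalue(k, N, K, n)
             for n in range(N + 1) for k in range(min(n, K) + 1)}
    s_obs = mhg_statistic(v, table)
    hits = 0
    for ones in combinations(range(N), K):  # positions of the 1's
        v0 = [0] * N
        for i in ones:
            v0[i] = 1
        if mhg_statistic(v0, table) <= s_obs:
            hits += 1
    return s_obs, Fraction(hits, comb(N, K))


v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
s_obs, p_mhg = mhg_pvalue_bruteforce(v_ex)  # enumerates 15,504 lists
print(float(s_obs), float(p_mhg))
```

For the toy example this involves only 15,504 lists, but the count grows as described above, so the approach does not scale beyond such small cases.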

Instead, the efficient calculation of $p^{\mathrm{mHG}}$ relies on the idea of path counting
[1]. To understand this idea, let us first take a step back and again look at the
definition of $s^{\mathrm{mHG}}$:

$$s^{\mathrm{mHG}} = \min_n \, p^{\mathrm{HG}}_{(n)}$$

We also saw (in Section 2.2) that each $p^{\mathrm{HG}}_{(n)}$, in turn, can be calculated as:

$$p^{\mathrm{HG}}_{(n)} = S(k_{(n)} - 1; N, K, n)$$

³A term used by Harvard Professor Joe Blitzstein for the relationship between
probabilities and expectations of indicator random variables, see
http://www.quora.com/What-are-the-top-10-big-ideas-in-Statistics-110-Introduction-to-Probability-at-Harvard
This shows that the value of each $p^{\mathrm{HG}}_{(n)}$ depends on exactly four parameters: n,
$k_{(n)}$, N, and K. Note that all $v^0 \in \mathcal{V}^{(K,N)}$ share the same N and K. Therefore, the only
parameters that vary during the calculations of their $s^{\mathrm{mHG},0}$ are n, the cutoff,
and $k_{(n)}$, the number of 1’s above the cutoff. How many unique parameter
combinations are there? We know that $0 \le k_{(n)} \le n$ and $0 \le n \le N$, and
$k_{(n)}$ can never exceed K, the total number of 1’s in the list.
Therefore, there are fewer than (K + 1) × (N + 1) unique combinations. This
leads us to a surprising observation: Despite the fact that there are more than
$10^{20}$ unique $v^0$ in $\mathcal{V}^{(20,100)}$, the calculations of all of their mHG test statistics
$s^{\mathrm{mHG},0}$ depend on fewer than 21 × 101 = 2121 unique values for $p^{\mathrm{HG}}_{(n)}$.



Figure 2: The set $\mathcal{M}^{(K,N)}$ of all hypergeometric configurations. a A (K + 1) ×
(N + 1) grid showing $\mathcal{M}^{(K,N)}$ as the blue shaded area. Each $v^0 \in \mathcal{V}^{(K,N)}$ can be
represented as a unique path through $\mathcal{M}^{(K,N)}$. The path for the example vector
v* is shown in blue. The set $\mathcal{R}$ of all configurations with an mHG test statistic
as good or better than that of v* is shown in red. Note that the points within
the white areas do not represent valid configurations (cf. Figure 7 in [1]). b A
more compact (K + 1) × (W + 1) grid for representing $\mathcal{M}^{(K,N)}$. There exists
a 1-to-1 mapping between the configurations $\mu_{(k,n)}$ in a and the configurations
$\mu_{(k,w)}$ in b.

Since a parameter combination (k, n) uniquely determines the value of the
hypergeometric p-value (assuming constant K and N), we refer to it as a
hypergeometric configuration $\mu_{(k,n)}$. We can then define the set of all hypergeometric
configurations $\mathcal{M}^{(K,N)} = \{\mu_{(k,n)} : 0 \le k \le n, \; 0 \le n \le N\}$. We can visualize
this set using a (K + 1) × (N + 1) grid (see Fig. 2a). However, we can
simplify our indexing with W = N − K and w = n − k. (W represents the total
number of 0’s in the list, and w the number of 0’s above the cutoff n.) We can
then use k and w to define hypergeometric configurations $\mu_{(k,w)}$, allowing us to
equivalently define $\mathcal{M}^{(K,N)}$ as follows:

$$\mathcal{M}^{(K,N)} = \{\mu_{(k,w)} : 0 \le k \le K, \; 0 \le w \le W\}$$

The fact that both definitions of $\mathcal{M}^{(K,N)}$ describe the same set of
hypergeometric configurations should be obvious by comparing Fig. 2a to Fig. 2b. We
can now easily calculate $|\mathcal{M}^{(20,100)}|$ = (20 + 1) × ((100 − 20) + 1) = 1701.
Therefore, there are exactly 1701 unique hypergeometric configurations involved in
the calculations of the $s^{\mathrm{mHG},0}$ for all the $v^0$ in $\mathcal{V}^{(20,100)}$. For each configuration
$\mu_{(k,w)}$, let $p_{(k,w)}$ represent its associated hypergeometric p-value:

$$p_{(k,w)} = S(k - 1; N, K, k + w)$$

We can then define the set $\mathcal{R}$ of all configurations with a hypergeometric p-value
at least as good as $s^{\mathrm{mHG}}$:

$$\mathcal{R} = \{\mu_{(k,w)} : p_{(k,w)} \le s^{\mathrm{mHG}}\}$$

Importantly, we can determine whether $\mu_{(k,w)} \in \mathcal{R}$, for all $\mu_{(k,w)}$, in O(KW),
using Algorithm 7.

We make another observation relating to $\mathcal{M}^{(K,N)}$: Each $v^0 \in \mathcal{V}^{(K,N)}$
has a unique representation as a path $\Lambda^0 = (\mu_0, \mu_1, \ldots, \mu_N)$, consisting of all the
hypergeometric configurations $\mu_n$ that we encounter when we go over all cutoffs
n for $v^0$. Using our (k, w)-indexing scheme, all paths start with $\mu_0 = \mu_{(0,0)}$ and
end with $\mu_N = \mu_{(K,W)}$. Fig. 2 shows the path representation of $v_{\mathrm{ex}}$. We say that
a path “crosses $\mathcal{R}$” when at least one of its $\mu_n$ is in $\mathcal{R}$. We can then express
$p^{\mathrm{mHG}}$ in terms of path counts:

$$p^{\mathrm{mHG}} = \frac{\text{\# of paths that cross } \mathcal{R}}{\text{total \# of paths}}
= 1 - \frac{\text{\# of paths that don't cross } \mathcal{R}}{\text{total \# of paths}}$$
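To make the path representation concrete, the following short Python sketch converts a ranked binary list into its sequence of (k, w) configurations. The function name `list_to_path` is an illustrative choice.

```python
def list_to_path(v):
    """Return the configurations (k, w) visited for the cutoffs n = 0, 1, ..., N."""
    k = w = 0
    path = [(0, 0)]  # every path starts at the empty cutoff
    for x in v:
        if x == 1:
            k += 1  # a 1 moves the path one step along the k axis
        else:
            w += 1  # a 0 moves the path one step along the w axis
        path.append((k, w))
    return path


v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
path = list_to_path(v_ex)
print(path[:7])  # [(0, 0), (1, 0), (1, 1), (2, 1), (3, 1), (3, 2), (4, 2)]
print(path[-1])  # (5, 15), i.e. (K, W)
```

Since each list element moves the path by exactly one step, every list with N elements and K 1’s corresponds to exactly one monotone path from (0, 0) to (K, W), and vice versa.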

Using these observations and definitions, it is possible to calculate $p^{\mathrm{mHG}}$
without explicitly calculating $s^{\mathrm{mHG},0}$ for each $v^0 \in \mathcal{V}^{(K,N)}$. Instead, we can rely
on a dynamic programming approach to count the number of paths that don’t
cross $\mathcal{R}$. This approach relies on the observation that all paths that contain
a certain configuration $\mu_{(k,w)}$ (k > 0, w > 0) also contain either $\mu_{(k-1,w)}$ or
$\mu_{(k,w-1)}$ (see Fig. 3). Let $C_{(k,w)}$ represent the fraction of paths that contain
$\mu_{(k,w)}$:

$$C_{(k,w)} = |\{\Lambda^0 : \mu_{(k,w)} \in \Lambda^0\}| \, / \, |\mathcal{V}^{(K,N)}|$$

We then observe the following recurrence relation for $C_{(k,w)}$, where n = k + w:

$$C_{(k,w)} = C_{(k-1,w)} \, \frac{K-k+1}{N-n+1} + C_{(k,w-1)} \, \frac{W-w+1}{N-n+1}$$
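One way to build confidence in this recurrence is to check it numerically. The fraction of paths through a configuration (k, w) is the probability that a random list has exactly k 1’s among its first n = k + w elements, which is the hypergeometric PMF f(k; N, K, n). The Python sketch below fills the grid using the recurrence and compares every entry with the PMF; the function name is illustrative, and terms with k = 0 or w = 0 simply omit the missing predecessor.

```python
from fractions import Fraction
from math import comb


def path_fractions(N, K):
    """Fill C[k][w], the fraction of paths through configuration (k, w)."""
    W = N - K
    C = [[Fraction(0)] * (W + 1) for _ in range(K + 1)]
    C[0][0] = Fraction(1)  # every path starts at (0, 0)
    for k in range(K + 1):
        for w in range(W + 1):
            if k == 0 and w == 0:
                continue
            n = k + w
            total = Fraction(0)
            if k > 0:  # arrive from (k - 1, w) by drawing a 1
                total += C[k - 1][w] * Fraction(K - k + 1, N - n + 1)
            if w > 0:  # arrive from (k, w - 1) by drawing a 0
                total += C[k][w - 1] * Fraction(W - w + 1, N - n + 1)
            C[k][w] = total
    return C


N, K = 20, 5
C = path_fractions(N, K)
matches = all(
    C[k][w] == Fraction(comb(K, k) * comb(N - K, w), comb(N, k + w))
    for k in range(K + 1) for w in range(N - K + 1)
)
print(matches)
```

The check also confirms that the final configuration (K, W) is contained in every path.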
Figure 3: Counting the number of paths that don’t cross $\mathcal{R}$ using a dynamic
programming approach [1]. All hypergeometric configurations $\mathcal{M}^{(K,N)}$ are
represented on a (K + 1) × (W + 1) grid, as in Fig. 2. Two paths are shown, one of
which crosses $\mathcal{R}$ (shown in gray). All paths containing $\mu_{(4,5)}$ also contain either
$\mu_{(4,4)}$ or $\mu_{(3,5)}$ (blue arrows).


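Using this recurrence relation, it is straightforward to recursively calculate
the fraction of paths that don’t cross $\mathcal{R}$. We let $m_{(k,w)}$ represent the fraction
of paths that contain the configuration $\mu_{(k,w)}$ and do not cross $\mathcal{R}$ up to and
including that configuration:

$$m_{(k,w)} = |\{\Lambda^0 : \mu_{(k,w)} \in \Lambda^0, \text{ and } \mu_i \notin \mathcal{R} \text{ for all } i \le k + w\}| \, / \, |\mathcal{V}^{(K,N)}|$$

We have:

$$m_{(0,0)} = \begin{cases} 0, & \text{if } s^{\mathrm{mHG}} = 1.0 \\ 1, & \text{otherwise} \end{cases}$$

Then we observe the following recurrence relation for the $m_{(k,w)}$ (k > 0, w > 0),
where n = k + w:

$$m_{(k,w)} = \begin{cases}
m_{(k-1,w)} \, \dfrac{K-k+1}{N-n+1} + m_{(k,w-1)} \, \dfrac{W-w+1}{N-n+1}, & \text{if } \mu_{(k,w)} \notin \mathcal{R} \\
0, & \text{otherwise}
\end{cases}$$

If k = 0, or w = 0, the first or second term of the recurrence relation is omitted,
respectively, for the case $\mu_{(k,w)} \notin \mathcal{R}$. Algorithm 8 uses this relation to calculate
$m_{(K,W)}$ in O(KW), yielding

$$p^{\mathrm{mHG}} = 1 - m_{(K,W)}$$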
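The dynamic programming scheme can be sketched in Python as follows. This is an illustration of the recurrence above under simplifying assumptions, not a transcription of Algorithms 7 and 8: membership in the set R is tested by recomputing each hypergeometric p-value from scratch, and exact fractions are used so that the "at least as good" comparison is not affected by rounding. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def hg_pvalue(k, N, K, n):
    """Exact upper tail P(X >= k) of the hypergeometric distribution."""
    num = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return Fraction(num, comb(N, n))


def mhg_pvalue_dp(v):
    """Return (s_mhg, p_mhg) for the ranked binary list v."""
    N, K = len(v), sum(v)
    W = N - K
    # Observed test statistic: minimum p-value over all cutoffs.
    k, s = 0, Fraction(1)
    for n, x in enumerate(v, start=1):
        k += x
        s = min(s, hg_pvalue(k, N, K, n))
    # m[k][w]: fraction of paths reaching (k, w) without having crossed R.
    m = [[Fraction(0)] * (W + 1) for _ in range(K + 1)]
    for k in range(K + 1):
        for w in range(W + 1):
            n = k + w
            if hg_pvalue(k, N, K, n) <= s:
                continue  # configuration is in R, so m stays 0
            if n == 0:
                m[0][0] = Fraction(1)
                continue
            total = Fraction(0)
            if k > 0:
                total += m[k - 1][w] * Fraction(K - k + 1, N - n + 1)
            if w > 0:
                total += m[k][w - 1] * Fraction(W - w + 1, N - n + 1)
            m[k][w] = total
    return s, 1 - m[K][W]


v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
s_mhg, p_mhg = mhg_pvalue_dp(v_ex)
print(float(s_mhg), float(p_mhg))
```

For the list (1, 1, 0, 0), only one of the six possible lists reaches a p-value as small as the observed 1/6, so the sketch returns a p-value of 1/6. For the toy example, the resulting p-value lies between the test statistic and K times the test statistic.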
4 The extended mHG (XL-mHG) test

In this section, I will first discuss two limitations of the mHG (in Section 4.1), 
in order to then motivate the definition of the XL-mHG test statistic, which 
involves two new parameters (see Section 4.2), as well as the XL-mHG p-value 
(see Section 4.3). In both cases, I will include a discussion on how to modify 
the efficient algorithms used in the implementation of the mHG (see Section 3), 
in order to accommodate the two new parameters. 

4.1 Limitations of the mHG test 

As previously discussed (see Section 2.3), abandoning the cutoff parameter n 
provides the mHG with an enormous advantage over simpler tests. However, the 
approach used by the mHG represents the “other extreme”, in the sense that the 
mHG does not exert any control over which cutoffs are tested for enrichment. 
In certain scenarios, this lack of control can turn into an “Achilles heel” and 
significantly reduce the usefulness of the test. 

For Scenario 1, imagine a relatively long list (say, N=10,000), which has a
very moderate enrichment (say, 1.5-fold) in the first half of the list. Fig. 4 shows
the distribution of mHG p-values obtained for 1,000 simulations of this scenario,
for K=500. As can be seen from the distribution, the p-values obtained in these
simulations are highly statistically significant. This is not surprising, since even
a relatively small fold enrichment of 1.5 is extremely unlikely to arise by chance
given a large enough sample. Therefore, a very good (i.e., small) mHG test
statistic $s^{\mathrm{mHG}} = \min_n \{p^{\mathrm{HG}}_{(n)}\}$ will be found at n ≈ 5,000, and result in a highly
significant $p^{\mathrm{mHG}}$. However, in many applications, a slight overrepresentation
of 1’s among the first half of the list may not represent a very interesting
enrichment signal, since weak enrichment among a large part of the list could
be artifactual, e.g. arising from a small and potentially unknown bias present
in the data.

For Scenario 2, imagine a medium-sized list (say, N=1,000), with K=100
1’s (i.e., “interesting” elements). Let us this time assume that there is no
enrichment present at all (i.e., the 1’s are randomly distributed), except for
a few “outliers” at the top, which are randomly distributed among the first 20
positions in the list. How many such outliers $k_{(20)}$ does it take for the mHG
to yield a statistically significant $p^{\mathrm{mHG}}$? Fig. 5 shows boxplots for $k_{(20)} = 1, \ldots, 10$,
showing that for $k_{(20)} = 6$, the majority of simulations result in a statistically
significant $p^{\mathrm{mHG}}$. Note that these positive test results are based on the high
ranking of only 6/100 = 6% of all the 1’s in the list. It should be noted here
that this extreme sensitivity can be thought of as a key feature of the mHG.
However, this amazing sensitivity simultaneously makes the mHG vulnerable
to outliers. One way to address this problem would be to perform a manual
“quality check” on positive test results that are based on only very few 1’s at
the top of the list. An alternative strategy, which is presented in this work, is
to introduce an additional parameter that directly controls the tradeoff between
the test’s sensitivity and robustness.

Figure 4: Assessing the sensitivity of the mHG to weak enrichment within 
the top 50% of a long list (Scenario 1). Shown is the distribution of mHG 
p-values obtained from 1,000 simulations of a long list (N=10,000), with K=500 
and a small enrichment (1.5-fold) among the first 5,000 elements. The gray line 
indicates a significance threshold of α = 0.01. 


Figure 5: Assessing the robustness of the mHG to outliers (Scenario 2). Box 
plot showing the distributions of mHG p-values obtained from 1,000 simulations 
each for $k_{(20)}$ = 1-10 “outliers” (x-axis) in lists with N=1,000 and K=100. The 
outliers are randomly distributed among the top 20 elements of the list, while the 
remaining 1’s show no enrichment, i.e., they are randomly distributed across the 
entire list. The gray line indicates a significance threshold of α = 0.01. 


4.2 The XL-mHG test statistic 

In both of the scenarios described in Section 4.1, better control over the cutoffs 
tested by the mHG could help overcome the limitations encountered: 

• In Scenario 1, the testing of very low cutoffs (large n) resulted in a positive 
test even though the enrichment pattern might be considered artifactual. 
To avoid this situation, we might want to limit the cutoffs tested to the 
first L ranks. For example, we might decide that the lowest cutoff at which 
we would expect to find meaningful enrichment corresponds to N/4. This 
would significantly reduce the probability of obtaining a significant test 
result simply because of weak enrichment affecting the top 50% of the list. 

• In Scenario 2, the high ranking of only 6% of the 1’s (“outliers”) was 
sufficient to obtain a positive test result in the majority of cases, even though 
the remaining 94% of 1’s exhibited no enrichment at all. To improve the 
robustness of our test, we might decide to ignore all cutoffs that have fewer 
than X 1’s above them. In our example, had we required at least 15% 
(X=15) of 1’s to be above the cutoff, this would have prevented the six 
outliers from generating a positive test result. 

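We therefore introduce the XL-mHG test statistic $s^{\mathrm{mHG}}_{X,L}$, which is a 
modification of $s^{\mathrm{mHG}}$: 

$$ s^{\mathrm{mHG}}_{X,L} = \begin{cases} \min\limits_{n \le L,\; k_{(n)} \ge X} p_{(n)} & \text{if } k_{(L)} \ge X \\ 1 & \text{otherwise} \end{cases} \qquad \text{(XL-mHG test statistic)} $$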

This statistic introduces two parameters, X and L ($0 < L \le N$, $X \ge 0$), 
as proposed above, which provide a certain level of control over which cutoffs 
should be tested for enrichment [a]. All cutoffs with fewer than X 1’s above them, 
as well as all cutoffs below L (i.e., n > L), are ignored. If there are fewer than 
X 1’s above the lowest permissible cutoff L, we have no enrichment at all 
($s^{\mathrm{mHG}}_{X,L} = 1$). We immediately observe that for X=0 and L=N, 
$s^{\mathrm{mHG}}_{X,L}$ reduces to $s^{\mathrm{mHG}}$. Therefore, the XL-mHG test is a 
generalization of the mHG test. 

Note that instead of X (or in addition to it), we could also choose to 
introduce a parameter T, which, in analogy to L, would simply result in all 
cutoffs above T being ignored. However, we deliberately decide against this 
possibility, since for most applications, specifying X is much more intuitive 
than specifying T [b]. 

To efficiently calculate $s^{\mathrm{mHG}}_{X,L}$, we introduce a modification of Algorithm 6. 
The changes are the restriction of the loop to the first L ranks and the additional 
check of k against X: 

[a] The L parameter was already discussed by Eden et al., who referred to it as $n_{\max}$, and 
noted that “it is possible to devise appropriate bounds and algorithms for computing the 
accurate p-value” for such a parameter [1]. This is exactly what we are concerned with here. 

[b] An additional possibility is the testing of only certain “slices” of cutoffs, e.g., 
$\{T_1 \ldots L_1\} \cup \{T_2 \ldots L_2\} \cup \ldots$ ($T_i < L_i$, $L_i < L_{i+1}$). Calculating the corresponding 
test statistics and p-values would require modifications of the respective algorithms that are 
similar to those introduced here. 

Algorithm 1: Calculate $s^{\mathrm{mHG}}_{X,L}$ in O(KN) 

Input: V=v, N=N, K=K, X=X, L=L 
Output: s = $s^{\mathrm{mHG}}_{X,L}$ 

    k ← 0 
    s ← 1.0 
    F ← Algorithm 5 (V, N, K)    // calculate all f(k_(n); N, K, n) 
    for n = 0 to L-1 do 
        if V[n] != 0 then    // we hit a “1” 
            k ← k + 1 
            if k ≥ X then 
                p ← Algorithm 4 (F[n+1], k, N, K, n+1)    // calculate p_(n+1) 
                s ← min(s, p) 
            end if 
        end if 
    end for 
    return s 
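
As an illustration, Algorithm 1 can be sketched in Python as follows. This is a minimal floating-point version under stated assumptions: the function name `xlmhg_statistic` is hypothetical, Algorithms 4 and 5 are inlined rather than called, and no precautions are taken against underflow or rounding error, which a production implementation would need for long lists.

```python
def xlmhg_statistic(v, X, L):
    """XL-mHG test statistic for a ranked binary list v (sketch of Algorithm 1)."""
    N = len(v)
    K = sum(1 for x in v if x)
    s = 1.0
    f = 1.0  # f(0; N, K, 0) = 1
    k = 0
    for n in range(min(L, N)):
        # update f(k_(n); N, K, n) to f(k_(n+1); N, K, n+1)
        if v[n]:
            f *= (n + 1) * (K - k) / ((N - n) * (k + 1))              # Identity 3
            k += 1
        else:
            f *= (n + 1) * (N - K - n + k) / ((N - n) * (n - k + 1))  # Identity 2
        cutoff = n + 1
        if v[n] and k >= X:
            # hypergeometric p-value at this cutoff: sum of f(i; N, K, cutoff), i >= k
            p, term, i = f, f, k
            while i < min(cutoff, K):
                term *= (cutoff - i) * (K - i) / ((i + 1) * (N - K - cutoff + i + 1))  # Identity 1
                p += term
                i += 1
            s = min(s, p)
    return s

# example list with N = 20 and K = 5
v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(xlmhg_statistic(v_ex, X=0, L=20))  # plain mHG test statistic
print(xlmhg_statistic(v_ex, X=3, L=5))   # only cutoffs n <= 5 with at least 3 1's
```

As in the pseudocode, the p-value is only evaluated at cutoffs that end in a “1”, since adding a “0” to the top of the list can never improve the hypergeometric p-value.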

4.3 The XL-mHG p-value 

The XL-mHG p-value $p^{\mathrm{mHG}}_{X,L}$ is defined analogously to the mHG p-value: Assume 
that we observe $s^{\mathrm{mHG}}_{X,L}$ for a ranked binary list v. Let $S^{\mathrm{mHG}}_{X,L}$ be a random variable 
representing the XL-mHG test statistic observed for a random permutation of 
v. Then: 

$$ p^{\mathrm{mHG}}_{X,L} = \Pr\left(S^{\mathrm{mHG}}_{X,L} \le s^{\mathrm{mHG}}_{X,L}\right) \qquad \text{(XL-mHG p-value)} $$
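
For very short lists, this definition can be checked directly by enumeration. The sketch below is not one of the algorithms described in this report: it simply places the K 1’s at every possible set of positions (which is equivalent to considering all distinct permutations of v) and counts how often the statistic is at least as small as the observed one. All function names are hypothetical, and exact fractions are used so that ties are handled without rounding issues.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations
from math import comb

@lru_cache(maxsize=None)
def tail(k, N, K, n):
    """Exact hypergeometric p-value P(X >= k) at cutoff n."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return Fraction(hits, comb(N, n))

def xlmhg_stat_exact(v, X, L):
    """XL-mHG statistic computed straight from its definition."""
    N, K = len(v), sum(v)
    best, k = Fraction(1), 0
    for n in range(1, L + 1):
        k += v[n - 1]
        if k >= X:
            best = min(best, tail(k, N, K, n))
    return best

def xlmhg_pvalue_bruteforce(v, X, L):
    """Fraction of all arrangements of the 1's with statistic <= the observed one."""
    N, K = len(v), sum(v)
    s_obs = xlmhg_stat_exact(v, X, L)
    count = 0
    for ones in combinations(range(N), K):
        perm = [0] * N
        for i in ones:
            perm[i] = 1
        if xlmhg_stat_exact(perm, X, L) <= s_obs:
            count += 1
    return Fraction(count, comb(N, K))

v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(float(xlmhg_pvalue_bruteforce(v_ex, X=0, L=20)))  # mHG p-value
print(float(xlmhg_pvalue_bruteforce(v_ex, X=3, L=5)))   # XL-mHG p-value
```

The number of arrangements grows as N choose K, so this approach is only useful as a reference for validating an efficient implementation on small examples.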

The introduction of X and L results in the elimination of certain 
configurations from $\mathcal{R}$: those with fewer than X 1’s above the cutoff, and those with 
cutoffs greater than L. Let $\mathcal{R}_{X,L}$ be this restricted set. Then we can express 
$\mathcal{R}_{X,L}$ as follows: 

$$ \mathcal{R}_{X,L} = \{\mu_{(k,w)} : p_{(k,w)} \le s^{\mathrm{mHG}}_{X,L},\; k \ge X,\; k + w \le L\} $$

In other words, for a configuration $\mu_{(k,w)}$ to be in $\mathcal{R}_{X,L}$, we do not only require 
its associated hypergeometric p-value $p_{(k,w)}$ to be at least as good as $s^{\mathrm{mHG}}_{X,L}$, but 
k and w must also fall within the limits defined by X and L. 

Alternatively, let us define $\mathcal{R}' = \{\mu_{(k,w)} : p_{(k,w)} \le s^{\mathrm{mHG}}_{X,L}\}$. We also define 
$\mathcal{X} = \{\mu_{(k,w)} : k < X\}$ and $\mathcal{L} = \{\mu_{(k,w)} : k + w > L\}$, representing the sets of 
configurations excluded by X and L, respectively. We can then express $\mathcal{R}_{X,L}$ in 
terms of these sets (see Fig. 6): 

$$ \mathcal{R}_{X,L} = \mathcal{R}' \setminus (\mathcal{X} \cup \mathcal{L}) $$



Figure 6: Expressing $\mathcal{R}_{X,L}$ using $\mathcal{R}'$, $\mathcal{X}$, and $\mathcal{L}$. The (K+1) x (W+1)-grid 
shows all configurations, as in Fig. 2. $\mathcal{R}'$ (red shaded area) is defined 
by $s^{\mathrm{mHG}}_{X,L}$ for the example vector $v_{ex}$, with X = 3 and L = 5. $\mathcal{X}$, the set of all 
configurations excluded by X, is shown as the yellow shaded area. Similarly, $\mathcal{L}$, 
the set of all configurations excluded by L, is shown as the gray shaded area. 
$\mathcal{R}_{X,L}$ is the subset of $\mathcal{R}'$ contained in neither $\mathcal{X}$ nor $\mathcal{L}$. 

It should now be clear that we can easily modify Algorithm 7 to find all 
configurations in $\mathcal{R}_{X,L}$. The changes are the upper bound L of the outer loop 
and the additional condition on k in the inner loop: 


Algorithm 2: Determine whether $\mu_{(k,w)} \in \mathcal{R}_{X,L}$, for all $\mu_{(k,w)}$, in O(KW) 

Input: s = $s^{\mathrm{mHG}}_{X,L}$ ∈ (0; 1), K=K, W=W, X=X, L=L 
Output: Binary array R[0..K, 0..W], indicating whether $\mu_{(k,w)} \in \mathcal{R}_{X,L}$. 

    R ← (K+1)x(W+1)-array of zeros 
    N ← K+W 
    n ← 1 
    p_start ← 1.0 
    while n ≤ L do 

        // calculate the p-value of the most enriched configuration at cutoff n 
        if n ≤ K then 
            k ← n 
            // calculate f(n; N, K, n) from f(n-1; N, K, n-1) using Identity 4 
            p_start ← p_start * (K-n+1)/(N-n+1) 
        else 
            k ← K 
            // calculate f(K; N, K, n) from f(K; N, K, n-1) using Identity 5 
            p_start ← p_start * n/(n-K) 
        end if 

        // find lowest k for which μ(k,w) ∈ R_{X,L} 
        p ← p_start 
        pval ← p_start 
        w ← n-k 
        while k ≥ X and pval ≤ s do 
            // we’re still in R_{X,L} 
            R[k,w] ← 1 
            // calculate f(k-1; N, K, n) from f(k; N, K, n) using Identity 6 
            p ← p * (k*(N-K-n+k)) / ((n-k+1)*(K-k+1)) 
            pval ← pval + p 
            k ← k-1 
            w ← w+1 
        end while 
        n ← n+1 
    end while 
    return R 

Finally, using the return value of Algorithm 2, we can use Algorithm 8 
virtually unchanged to calculate $p^{\mathrm{mHG}}_{X,L}$. The only changes are the additional 
inputs X and L and the call to Algorithm 2 instead of Algorithm 7: 


Algorithm 3: Calculate $p^{\mathrm{mHG}}_{X,L}$ in O(KW) 

Input: s = $s^{\mathrm{mHG}}_{X,L}$ ∈ (0; 1), K=K, W=W, X=X, L=L 
Output: p = $p^{\mathrm{mHG}}_{X,L}$ 

    R ← Algorithm 2 (s, K, W, X, L) 
    M ← (K+1)x(W+1)-array 
    M[0,0] ← 1.0 
    N ← K+W 
    for n = 1 to N do 
        k ← min(n,K) 
        w ← n-k 
        while k ≥ 0 and w ≤ W do 
            if R[k,w] = 1 then 
                M[k,w] ← 0 
            else if w > 0 and k > 0 then 
                M[k,w] ← M[k,w-1] * (W-w+1)/(N-n+1) + 
                         M[k-1,w] * (K-k+1)/(N-n+1) 
            else if w > 0 then 
                M[k,w] ← M[k,w-1] * (W-w+1)/(N-n+1) 
            else if k > 0 then 
                M[k,w] ← M[k-1,w] * (K-k+1)/(N-n+1) 
            end if 
            w ← w + 1 
            k ← k - 1 
        end while 
    end for 
    p ← 1.0 - M[K,W] 
    return p 
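
Algorithms 2 and 3 can be combined into a single Python function, sketched below under a few assumptions that are not part of the pseudocode: the name `xlmhg_pvalue` is hypothetical, the relative tolerance `tol` is added so that configurations whose p-value equals s up to rounding error are still counted, and the inner loop carries extra bounds checks on k and w. The sketch uses plain floating-point arithmetic and does not guard against underflow for large K.

```python
def xlmhg_pvalue(s, K, W, X, L, tol=1e-12):
    """XL-mHG p-value for test statistic s (sketch of Algorithms 2 and 3)."""
    N = K + W

    # Algorithm 2: mark all configurations (k, w) that belong to R_{X,L}
    R = [[False] * (W + 1) for _ in range(K + 1)]
    p_start = 1.0
    for n in range(1, min(L, N) + 1):
        if n <= K:
            k = n
            p_start *= (K - n + 1) / (N - n + 1)   # Identity 4
        else:
            k = K
            p_start *= n / (n - K)                 # Identity 5
        p = pval = p_start
        w = n - k
        while k >= max(X, 0) and w <= W and pval <= s * (1 + tol):
            R[k][w] = True
            p *= k * (N - K - n + k) / ((n - k + 1) * (K - k + 1))  # Identity 6
            pval += p
            k -= 1
            w += 1

    # Algorithm 3: probability of reaching (K, W) without ever entering R_{X,L}
    M = [[0.0] * (W + 1) for _ in range(K + 1)]
    M[0][0] = 1.0
    for n in range(1, N + 1):
        k = min(n, K)
        w = n - k
        while k >= 0 and w <= W:
            if not R[k][w]:
                from_zero = M[k][w - 1] * (W - w + 1) / (N - n + 1) if w > 0 else 0.0
                from_one = M[k - 1][w] * (K - k + 1) / (N - n + 1) if k > 0 else 0.0
                M[k][w] = from_zero + from_one
            k -= 1
            w += 1
    return 1.0 - M[K][W]

# N = 20, K = 5 example: the test statistics below are 216/15504 and 496/15504
print(xlmhg_pvalue(216 / 15504, K=5, W=15, X=0, L=20))  # mHG p-value
print(xlmhg_pvalue(496 / 15504, K=5, W=15, X=3, L=5))   # XL-mHG p-value
```

The tolerance matters in practice, because the configuration that produced the observed test statistic always lies exactly on the boundary of the rejection region.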

5 Quantifying the strength of enrichment 


5.1 Motivation 


Sections 3 and 4 detail efficient algorithms to calculate the mHG and XL-mHG 
test statistics, respectively, as well as their associated p-values. When these 
quantities indicate the presence of significant enrichment, the next question 
becomes: How strong is the enrichment detected? In other words, how can 
we quantify the effect size of the enrichment, as opposed to its significance? 
In the case of the simple enrichment test that operated using a fixed cutoff 
(see Section 2.2), the answer to this question is easy: We can use the cutoff 
n to calculate a fold enrichment value $e_{(n)}$, representing the ratio between the 
observed and the expected number of 1’s above the cutoff, where the expected 
number is easily calculated as $K \cdot (n/N)$: 

$$ e_{(n)} = \frac{k_{(n)}}{K \cdot (n/N)} \qquad \text{(fold enrichment)} $$


How can we adapt this simple definition to estimate the strength of 
enrichment in the absence of a fixed cutoff? It would at first seem natural to define a 
“maximum fold enrichment”, in complete analogy to the mHG test statistic: 

$$ e^{\max} = \max_n e_{(n)} \qquad \text{(max. fold enrichment)} $$

However, there is a clear problem with this approach: For small n, the behavior 
of $e_{(n)}$ is very erratic. To demonstrate this, recall the example $v_{ex}$: 

$$ v_{ex} = (1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)^T $$

The values of $e_{(n)}$ for this ranked list are shown in Fig. 7. The largest fold 
enrichment is found for n = 1, where $e_{(n)}$ = 4.0. In fact, this value is the largest 
fold enrichment value any list v with N = 20 and K = 5 can attain, since it 
corresponds to the situation where all of the elements above the cutoff consist of 
1’s. In particular, we have $e^{\max}$ = 4.0 for any such ranked list which has a “1” 
as its first element. However, it would be rather useless to rely on a definition 
of enrichment that allows its value to be determined solely by the first element 
of the list. (Imagine this in a list with 1,000 elements!) Therefore, we should 
not use $e^{\max}$ to quantify enrichment. 

Since maximizing the fold enrichment over all cutoffs does not seem to work 
well, we could instead try to devise a way of selecting a “special” cutoff n*, 
and then use $e^* = e_{(n^*)}$ as an overall measure of the strength of enrichment. 
A simple choice of n* would be the cutoff that determines the mHG test 
statistic: 

$$ n^* = \arg\min_n \, p_{(n)} $$

For $v_{ex}$, n* = 6, and therefore $e^* = e_{(6)} \approx 2.7$, which seems like a much more 
reasonable value than 4.0. In contrast to $e^{\max}$, $e^*$ provides a useful estimate of 
the strength of enrichment that is no longer influenced by the fluctuations in 
$e_{(n)}$ for small n. 

Figure 7: Quantifying the strength of enrichment for the ranked binary list 
$v_{ex}$. Each bar represents the fold enrichment $e_{(n)}$ at a given cutoff n. Bars 
highlighted in yellow correspond to positions in the list that have a “1” element. 

However, the definition of $e^*$ is still somewhat unsatisfactory, in the sense 
that basing our estimate on the fold enrichment value at a single cutoff seems 
unjustified. After all, a significant mHG enrichment test result does not imply 
that n* is the only cutoff above which an enrichment of 1’s can be observed. 
In fact, in our example of $v_{ex}$, we see that $e_{(4)}$ = 3.0, which is greater than 
$e_{(6)}$! Assuming that we have already established the general significance of 
enrichment in $v_{ex}$ using the mHG test ($p^{\mathrm{mHG}} \approx 0.024$), why should we ignore 
$e_{(4)}$ in quantifying the strength of the enrichment for $v_{ex}$? 
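
The numbers quoted for the example list can be reproduced with a few lines of Python. This is a small sketch using exact fractions; the variable names are hypothetical, and the hypergeometric p-values are computed directly from binomial coefficients rather than with the recurrence-based algorithms of Section 3.

```python
from fractions import Fraction
from math import comb

v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
N, K = len(v_ex), sum(v_ex)

def tail(k, n):
    """Exact hypergeometric p-value for k or more 1's above cutoff n."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return Fraction(hits, comb(N, n))

e, p = {}, {}
k = 0
for n in range(1, N + 1):
    k += v_ex[n - 1]
    e[n] = Fraction(k * N, K * n)   # observed k divided by expected K * (n / N)
    p[n] = tail(k, n)

e_max = max(e.values())             # maximum fold enrichment over all cutoffs
n_star = min(p, key=p.get)          # cutoff that determines the mHG test statistic
e_star = e[n_star]                  # fold enrichment at that cutoff

print(float(e_max), n_star, float(e_star), float(e[4]))
```

For this list, the maximum fold enrichment is attained at the very first cutoff, whereas the cutoff with the smallest hypergeometric p-value has a lower fold enrichment than cutoff 4.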

5.2 The mHG enrichment score 

The preceding discussion motivates us to find some middle ground in defining 
an mHG enrichment score $e^{\mathrm{mHG}}$: We do not want to include all cutoffs in the 
calculation of $e^{\mathrm{mHG}}$, since for small n, the fold enrichment values are unreliable. 
Neither do we want to base $e^{\mathrm{mHG}}$ on the fold enrichment at a single cutoff, since 
this seems unnecessarily restrictive. Instead, we observe that these two choices 
represent the two opposite extremes in a more general framework: Let us define 
a p-value threshold $\psi \ge s^{\mathrm{mHG}}$, and let $C(\psi)$ represent the set of cutoffs that are 
associated with hypergeometric p-values ≤ ψ: 

$$ C(\psi) = \{n : p_{(n)} \le \psi\} $$

Then, let $e^{\mathrm{mHG}}(\psi)$ be defined as follows: 

$$ e^{\mathrm{mHG}}(\psi) = \max_{n \in C(\psi)} e_{(n)} \qquad \text{(mHG enrichment score)} $$

In order for the fold enrichment at a specific cutoff to be included in the 
calculation of $e^{\mathrm{mHG}}(\psi)$, the cutoff needs to be associated with a certain minimum 
significance of hypergeometric enrichment ψ. In particular, we observe that for 
ψ = 1.0, we have $e^{\mathrm{mHG}}(\psi) = e^{\max}$, and for $\psi = s^{\mathrm{mHG}}$, we have $e^{\mathrm{mHG}}(\psi) = e^*$. 
Generally speaking, smaller values of ψ better protect $e^{\mathrm{mHG}}(\psi)$ against 
fluctuations in $e_{(n)}$ associated with small n, but they are also more likely to 
result in an overly conservative estimate of enrichment. Thus, the choice of ψ 
determines the trade-off between robustness and accuracy in quantifying the 
strength of enrichment. 

5.3 The XL-mHG enrichment score 

In analogy to the generalization from $s^{\mathrm{mHG}}$ to $s^{\mathrm{mHG}}_{X,L}$, we would like to adapt the 
preceding definition of $e^{\mathrm{mHG}}$ for use in conjunction with the XL-mHG test. For 
this purpose, we require $\psi \ge s^{\mathrm{mHG}}_{X,L}$ and define $C_{X,L}(\psi)$ in analogy to $C(\psi)$, but 
restricted to the cutoffs permitted by X and L: 

$$ C_{X,L}(\psi) = \{n : k_{(n)} \ge X,\; n \le L,\; p_{(n)} \le \psi\} $$

We then define $e^{\mathrm{mHG}}_{X,L}(\psi)$ in analogy to $e^{\mathrm{mHG}}(\psi)$: 

$$ e^{\mathrm{mHG}}_{X,L}(\psi) = \max_{n \in C_{X,L}(\psi)} e_{(n)} \qquad \text{(XL-mHG enrichment score)} $$

As for $s^{\mathrm{mHG}}_{X,L}$, for X = 0 and L = N, we have $e^{\mathrm{mHG}}_{X,L} = e^{\mathrm{mHG}}$. Also, since 
$e^{\mathrm{mHG}}_{X,L}$ is only of interest when $p^{\mathrm{mHG}}_{X,L}$ is considered significant (at a given 
significance level α), ψ can theoretically be set to a permissive value like 0.05, even 
if α was chosen very conservatively (e.g., as a result of Bonferroni correction). 
However, any value $\psi > \alpha > s^{\mathrm{mHG}}_{X,L}$ will lead to a less conservatively biased 
estimate of enrichment than $e^*$. 
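
The enrichment score definition translates almost literally into code. The following sketch is a direct, unoptimized reading of the definition: the function name `xlmhg_enrichment_score` and its default arguments are assumptions, the hypergeometric p-values are recomputed from binomial coefficients at every cutoff, and `None` is returned if no cutoff qualifies.

```python
from fractions import Fraction
from math import comb

def xlmhg_enrichment_score(v, psi, X=0, L=None):
    """Largest fold enrichment over cutoffs n <= L with k >= X and p-value <= psi."""
    N, K = len(v), sum(v)
    L = N if L is None else L
    best, k = None, 0
    for n in range(1, L + 1):
        k += v[n - 1]
        if k < X:
            continue  # cutoff excluded by X
        hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
        p_n = Fraction(hits, comb(N, n))   # hypergeometric p-value at cutoff n
        if p_n <= psi:
            e_n = Fraction(k * N, K * n)   # fold enrichment at cutoff n
            best = e_n if best is None else max(best, e_n)
    return best

v_ex = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(float(xlmhg_enrichment_score(v_ex, psi=1.0)))             # all cutoffs
print(float(xlmhg_enrichment_score(v_ex, psi=0.05)))            # permissive threshold
print(float(xlmhg_enrichment_score(v_ex, psi=0.05, X=3, L=5)))  # restricted by X and L
```

With the default arguments (X = 0, L = N), the function computes the mHG enrichment score as a special case.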

A Recurrence relations for the hypergeometric PMF 


In this section, we simply state several recurrence relations that are used in the 
mHG algorithm, while postponing the derivations to Appendix C. 

• Calculate f(k+1; N, K, n) from f(k; N, K, n): 

$$ f(k+1; N, K, n) = f(k; N, K, n) \, \frac{(n-k)(K-k)}{(k+1)(N-K-n+k+1)} \qquad \text{(Identity 1)} $$

See Appendix C.1 for the derivation. 

• Calculate f(k; N, K, n+1) from f(k; N, K, n): 

$$ f(k; N, K, n+1) = f(k; N, K, n) \, \frac{(n+1)(N-K-n+k)}{(N-n)(n-k+1)} \qquad \text{(Identity 2)} $$

See Appendix C.2 for the derivation. 

• Calculate f(k+1; N, K, n+1) from f(k; N, K, n): 

$$ f(k+1; N, K, n+1) = f(k; N, K, n) \, \frac{(n+1)(K-k)}{(N-n)(k+1)} \qquad \text{(Identity 3)} $$

See Appendix C.3 for the derivation. 

• Calculate f(n; N, K, n) from f(n-1; N, K, n-1): 

$$ f(n; N, K, n) = f(n-1; N, K, n-1) \, \frac{K-n+1}{N-n+1} \qquad \text{(Identity 4)} $$

This assumes n ≤ K. See Appendix C.4 for the derivation. 

• Calculate f(K; N, K, n) from f(K; N, K, n-1): 

$$ f(K; N, K, n) = f(K; N, K, n-1) \, \frac{n}{n-K} \qquad \text{(Identity 5)} $$

See Appendix C.5 for the derivation. 

• Calculate f(k-1; N, K, n) from f(k; N, K, n): 

$$ f(k-1; N, K, n) = f(k; N, K, n) \, \frac{k(N-K-n+k)}{(n-k+1)(K-k+1)} \qquad \text{(Identity 6)} $$

See Appendix C.6 for the derivation. 
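
The six identities are easy to check numerically against the definition of the hypergeometric PMF. The sketch below does this with exact fractions for all valid (k, n) of a small example; the function names are hypothetical, and the PMF is taken to be zero outside its support.

```python
from fractions import Fraction
from math import comb

def f(k, N, K, n):
    """Hypergeometric PMF f(k; N, K, n), defined as 0 outside its support."""
    if k < 0 or k > K or n - k < 0 or n - k > N - K:
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def check_identities(N, K):
    """Return True if Identities 1-6 hold for every valid (k, n)."""
    for n in range(N + 1):
        for k in range(max(0, n - (N - K)), min(n, K) + 1):
            fk = f(k, N, K, n)
            # Identity 1: k -> k+1 at fixed n
            assert f(k + 1, N, K, n) == fk * Fraction((n - k) * (K - k), (k + 1) * (N - K - n + k + 1))
            # Identity 6: k -> k-1 at fixed n
            assert f(k - 1, N, K, n) == fk * Fraction(k * (N - K - n + k), (n - k + 1) * (K - k + 1))
            if n < N:
                # Identity 2: n -> n+1 at fixed k
                assert f(k, N, K, n + 1) == fk * Fraction((n + 1) * (N - K - n + k), (N - n) * (n - k + 1))
                # Identity 3: (k, n) -> (k+1, n+1)
                assert f(k + 1, N, K, n + 1) == fk * Fraction((n + 1) * (K - k), (N - n) * (k + 1))
        if 1 <= n <= K:
            # Identity 4: all n elements above the cutoff are 1's
            assert f(n, N, K, n) == f(n - 1, N, K, n - 1) * Fraction(K - n + 1, N - n + 1)
        if n > K:
            # Identity 5: all K 1's are above the cutoff
            assert f(K, N, K, n) == f(K, N, K, n - 1) * Fraction(n, n - K)
    return True

print(check_identities(20, 5))
```

A check of this kind is a cheap safeguard when porting the recurrences to another language, since an off-by-one error in any factor shows up immediately.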

B Algorithms 

B.1 Efficient calculation of the mHG test statistic 


Algorithm 4: Calculate $p_{(n)}$ from $f(k_{(n)}; N, K, n)$ in O(K) 

Input: f = f(k_(n); N, K, n), k = k_(n), N=N, K=K, n=n 
Output: p = p_(n) 

    p ← f 
    while k < min(n, K) do 
        // calculate f(k+1; N, K, n) from f(k; N, K, n) using Identity 1 
        f ← f * ((n-k)*(K-k)) / ((k+1)*(N-K-n+k+1)) 
        p ← p + f 
        k ← k + 1 
    end while 
    return p 


Algorithm 5: Calculate $f(k_{(n)}; N, K, n)$ for all n, in O(N) 

Input: V=v, N=N, K=K 
Output: F = f(k_(n); N, K, n) for all n = 0...N 

    F[0] ← 1.0 
    k ← 0 
    for n = 0 to N-1 do 
        if V[n] = 0 then 
            // calculate f(k; N, K, n+1) from f(k; N, K, n) using Identity 2 
            F[n+1] ← F[n] * ((n+1)*(N-K-n+k)) / ((N-n)*(n-k+1)) 
        else 
            // calculate f(k+1; N, K, n+1) from f(k; N, K, n) using Identity 3 
            F[n+1] ← F[n] * ((n+1)*(K-k)) / ((N-n)*(k+1)) 
            k ← k + 1 
        end if 
    end for 
    return F 

Algorithm 6: Calculate $s^{\mathrm{mHG}}$ in O(KN) 

Input: V=v, N=N, K=K 
Output: s = $s^{\mathrm{mHG}}$ 

    k ← 0 
    s ← 1.0 
    F ← Algorithm 5 (V, N, K)    // calculate all f(k_(n); N, K, n) 
    for n = 0 to N-1 do 
        if V[n] != 0 then    // we hit a “1” 
            k ← k + 1 
            p ← Algorithm 4 (F[n+1], k, N, K, n+1)    // calculate p_(n+1) 
            s ← min(s, p) 
        end if 
    end for 
    return s 

B.2 Efficient calculation of the mHG p-value 


I first describe the algorithm to efficiently determine, for given K, W, and $s^{\mathrm{mHG}}$, 
whether $\mu_{(k,w)} \in \mathcal{R}$, for all $\mu_{(k,w)}$. A hypergeometric configuration $\mu_{(k,w)}$ is in $\mathcal{R}$ 
if the hypergeometric p-value $p_{(k,w)}$ associated with it is at least as “good” (i.e., 
equal to or smaller than) the observed mHG test statistic $s^{\mathrm{mHG}}$ (see Section 3.2). 
Similarly to the approach chosen for calculating $s^{\mathrm{mHG}}$, we avoid calculating $p_{(k,w)}$ 
“from scratch”, and rely on recurrence relations instead. 

Let $k^*_n = \min\{n, K\}$. At each cutoff n, the algorithm first uses a recurrence 
relation to calculate the hypergeometric p-value $p_{(k^*_n,\, n-k^*_n)}$ for the 
configuration representing the strongest possible enrichment (see blue arrows in Fig. 8). If 
$p_{(k^*_n,\, n-k^*_n)} \le s^{\mathrm{mHG}}$, the p-value for the configuration with the next-lowest 
enrichment at cutoff n is calculated using another recurrence relation, until a 
configuration that is not in $\mathcal{R}$ is found (see black arrows in Fig. 8). The algorithm 
stops once it finds an n for which $\mu_{(k^*_n,\, n-k^*_n)}$ is no longer in $\mathcal{R}$. 


Figure 8: Illustration of Algorithm 7. All hypergeometric configurations $\mu_{(k,w)}$ 
are represented on a (K+1) x (W+1) grid, as in Fig. 2. The red shaded 
region contains all configurations that are in $\mathcal{R}$. 

Algorithm 7: Determine whether $\mu_{(k,w)} \in \mathcal{R}$, for all $\mu_{(k,w)}$, in O(KW) 

Input: s = $s^{\mathrm{mHG}}$ ∈ (0; 1), K=K, W=W 
Output: Binary array R[0..K, 0..W], indicating whether $\mu_{(k,w)} \in \mathcal{R}$. 

    R ← (K+1)x(W+1)-array of zeros 
    N ← K+W 
    n ← 1 
    p_start ← 1.0 
    while n ≤ N do 

        // calculate the p-value of the most enriched configuration at cutoff n 
        if n ≤ K then 
            k ← n 
            // calculate f(n; N, K, n) from f(n-1; N, K, n-1) using Identity 4 
            p_start ← p_start * (K-n+1)/(N-n+1) 
        else 
            k ← K 
            // calculate f(K; N, K, n) from f(K; N, K, n-1) using Identity 5 
            p_start ← p_start * n/(n-K) 
        end if 

        // find lowest k for which μ(k,w) ∈ R 
        p ← p_start 
        pval ← p_start 
        w ← n-k 
        while pval ≤ s do 
            // we’re still in R 
            R[k,w] ← 1 
            // calculate f(k-1; N, K, n) from f(k; N, K, n) using Identity 6 
            p ← p * (k*(N-K-n+k)) / ((n-k+1)*(K-k+1)) 
            pval ← pval + p 
            k ← k-1 
            w ← w+1 
        end while 
        n ← n+1 
    end while 
    return R 

The final algorithm for calculating $p^{\mathrm{mHG}}$ relies on Algorithm 7 to determine 
$\mathcal{R}$, and then determines the probability of all paths that do not cross $\mathcal{R}$ using a 
simple recurrence relation (see Section 3.2). 


Algorithm 8: Calculate $p^{\mathrm{mHG}}$ in O(KW) 

Input: s = $s^{\mathrm{mHG}}$ ∈ (0; 1), K=K, W=W 
Output: p = $p^{\mathrm{mHG}}$ 

    R ← Algorithm 7 (s, K, W) 
    M ← (K+1)x(W+1)-array 
    M[0,0] ← 1.0 
    N ← K+W 
    for n = 1 to N do 
        k ← min(n,K) 
        w ← n-k 
        while k ≥ 0 and w ≤ W do 
            if R[k,w] = 1 then 
                M[k,w] ← 0 
            else if w > 0 and k > 0 then 
                M[k,w] ← M[k,w-1] * (W-w+1)/(N-n+1) + 
                         M[k-1,w] * (K-k+1)/(N-n+1) 
            else if w > 0 then 
                M[k,w] ← M[k,w-1] * (W-w+1)/(N-n+1) 
            else if k > 0 then 
                M[k,w] ← M[k-1,w] * (K-k+1)/(N-n+1) 
            end if 
            w ← w + 1 
            k ← k - 1 
        end while 
    end for 
    p ← 1.0 - M[K,W] 
    return p 
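
C Derivations 

In these derivations, we omit terms that immediately cancel out because they 
appear identically in both numerator and denominator. 

C.1 Derivation of Identity 1 

Using the definition of the Hypergeometric PMF, we have: 

$$ f(k+1; N, K, n) = \frac{\binom{K}{k+1}\binom{N-K}{n-k-1}}{\binom{N}{n}} \tag{2} $$

Likewise, we have: 

$$ \binom{N}{n} = \frac{\binom{K}{k}\binom{N-K}{n-k}}{f(k; N, K, n)} \tag{3} $$

By substituting $\binom{N}{n}$ in (2) with (3), we then have: 

$$ \begin{aligned} f(k+1; N, K, n) &= f(k; N, K, n) \, \frac{\binom{K}{k+1}\binom{N-K}{n-k-1}}{\binom{K}{k}\binom{N-K}{n-k}} \\ &= f(k; \ldots) \, \frac{(n-k)!\,(N-K-n+k)!\,k!\,(K-k)!}{(n-k-1)!\,(N-K-n+k+1)!\,(k+1)!\,(K-k-1)!} \\ &= f(k; N, K, n) \, \frac{(n-k)(K-k)}{(k+1)(N-K-n+k+1)} \end{aligned} \tag{4} $$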

C.2 Derivation of Identity 2 

Using the definition of the Hypergeometric PMF, we have: 

$$ f(k; N, K, n+1) = \frac{\binom{K}{k}\binom{N-K}{n-k+1}}{\binom{N}{n+1}} \tag{5} $$

Likewise, we have: 

$$ \binom{K}{k} = f(k; N, K, n) \, \frac{\binom{N}{n}}{\binom{N-K}{n-k}} \tag{6} $$

By substituting $\binom{K}{k}$ in (5) with (6), we then have: 

$$ \begin{aligned} f(k; N, K, n+1) &= f(k; N, K, n) \, \frac{\binom{N}{n}\binom{N-K}{n-k+1}}{\binom{N}{n+1}\binom{N-K}{n-k}} \\ &= f(k; \ldots) \, \frac{(n+1)!\,(N-n-1)!\,(n-k)!\,(N-K-n+k)!}{n!\,(N-n)!\,(n-k+1)!\,(N-K-n+k-1)!} \\ &= f(k; N, K, n) \, \frac{(n+1)(N-K-n+k)}{(N-n)(n-k+1)} \end{aligned} \tag{7} $$

C.3 Derivation of Identity 3 

Using the definition of the Hypergeometric PMF, we have: 

$$ f(k+1; N, K, n+1) = \frac{\binom{K}{k+1}\binom{N-K}{n-k}}{\binom{N}{n+1}} \tag{8} $$

Likewise, we have: 

$$ \binom{N-K}{n-k} = f(k; N, K, n) \, \frac{\binom{N}{n}}{\binom{K}{k}} \tag{9} $$

By substituting $\binom{N-K}{n-k}$ in (8) with (9), we then have: 

$$ \begin{aligned} f(k+1; N, K, n+1) &= f(k; N, K, n) \, \frac{\binom{K}{k+1}\binom{N}{n}}{\binom{N}{n+1}\binom{K}{k}} \\ &= f(k; \ldots) \, \frac{(n+1)!\,(N-n-1)!\,k!\,(K-k)!}{n!\,(N-n)!\,(k+1)!\,(K-k-1)!} \\ &= f(k; N, K, n) \, \frac{(n+1)(K-k)}{(N-n)(k+1)} \end{aligned} \tag{10} $$

C.4 Derivation of Identity 4 

We first derive the more general relation between f(k; N, K, n) and f(k-1; N, K, n-1), 
and then substitute k = n. By definition of the Hypergeometric PMF, we have: 

$$ f(k; N, K, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \tag{11} $$

Likewise, we have: 

$$ \binom{N-K}{n-k} = f(k-1; N, K, n-1) \, \frac{\binom{N}{n-1}}{\binom{K}{k-1}} \tag{12} $$

By substituting $\binom{N-K}{n-k}$ in (11) with (12), we then have: 

$$ \begin{aligned} f(k; N, K, n) &= f(k-1; N, K, n-1) \, \frac{\binom{K}{k}\binom{N}{n-1}}{\binom{N}{n}\binom{K}{k-1}} \\ &= f(k-1; N, K, n-1) \, \frac{n!\,(N-n)!\,(k-1)!\,(K-k+1)!}{k!\,(K-k)!\,(n-1)!\,(N-n+1)!} \\ &= f(k-1; N, K, n-1) \, \frac{n(K-k+1)}{k(N-n+1)} \end{aligned} \tag{13} $$

Then, substituting k = n (assuming n ≤ K): 

$$ f(n; N, K, n) = f(n-1; N, K, n-1) \, \frac{K-n+1}{N-n+1} \tag{14} $$

C.5 Derivation of Identity 5 

We first derive the more general relation between f(k; N, K, n) and f(k; N, K, n-1), 
and then substitute k = K. By definition of the Hypergeometric PMF, we have: 

$$ f(k; N, K, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \tag{15} $$

Likewise, we have: 

$$ \binom{K}{k} = f(k; N, K, n-1) \, \frac{\binom{N}{n-1}}{\binom{N-K}{n-k-1}} \tag{16} $$

By substituting $\binom{K}{k}$ in (15) with (16), we then have: 

$$ \begin{aligned} f(k; N, K, n) &= f(k; N, K, n-1) \, \frac{\binom{N}{n-1}\binom{N-K}{n-k}}{\binom{N}{n}\binom{N-K}{n-k-1}} \\ &= f(k; \ldots) \, \frac{n!\,(N-n)!\,(n-k-1)!\,(N-K-n+k+1)!}{(n-1)!\,(N-n+1)!\,(n-k)!\,(N-K-n+k)!} \\ &= f(k; \ldots) \, \frac{n(N-K-n+k+1)}{(N-n+1)(n-k)} \end{aligned} \tag{17} $$

Then, substituting k = K: 

$$ f(K; N, K, n) = f(K; N, K, n-1) \, \frac{n}{n-K} \tag{18} $$

C.6 Derivation of Identity 6 

Using the definition of the Hypergeometric PMF, we have: 

$$ f(k-1; N, K, n) = \frac{\binom{K}{k-1}\binom{N-K}{n-k+1}}{\binom{N}{n}} \tag{19} $$

Likewise, we have: 

$$ \binom{N}{n} = \frac{\binom{K}{k}\binom{N-K}{n-k}}{f(k; N, K, n)} \tag{20} $$

By substituting $\binom{N}{n}$ in (19) with (20), we then have: 

$$ \begin{aligned} f(k-1; N, K, n) &= f(k; N, K, n) \, \frac{\binom{K}{k-1}\binom{N-K}{n-k+1}}{\binom{K}{k}\binom{N-K}{n-k}} \\ &= f(k; \ldots) \, \frac{k!\,(K-k)!\,(n-k)!\,(N-K-n+k)!}{(k-1)!\,(K-k+1)!\,(n-k+1)!\,(N-K-n+k-1)!} \\ &= f(k; N, K, n) \, \frac{k(N-K-n+k)}{(n-k+1)(K-k+1)} \end{aligned} \tag{21} $$